Utilizing Stacking for Feature Reduction in Graph-Based Genealogical Record Linkage
Abstract
Genealogy research centers on collecting records about an individual from various sources and combining the information to gain a larger historical perspective on that individual, commonly in the form of a pedigree. Data extraction, the internet, and other technological advancements have made large amounts of digital genealogical data more accessible. Determining whether a digital record is relevant to a given pedigree means deciding whether the individual described in the record is in fact an individual within the pedigree. This process is called Genealogical Record Linkage (GRL). GRL can be automated through data mining techniques by training machine-learned models on hand-labeled comparisons. In this paper, we compare two such models, a tabular approach and a graph-based stacking approach, and report the successful application of both on a large, post-blocking database. We also note the successful integration of these approaches in an open-source distributed genealogy program that finds relevant matches to a given pedigree from multiple online repositories.
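The graph-based stacking idea can be illustrated with a minimal sketch. It assumes a two-level arrangement in which a first-level scorer compares two person records using only their own fields, and a second-level scorer combines that result with a single aggregated score for their relatives, rather than adding every relative's fields as separate tabular features. The field names, similarity measures, and fixed weights below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from difflib import SequenceMatcher


def base_score(a, b):
    """Level 0: score a pair of person records from their own fields only."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    year_gap = abs(a["birth_year"] - b["birth_year"])
    year_sim = max(0.0, 1.0 - year_gap / 10.0)  # hypothetical 10-year tolerance
    return 0.7 * name_sim + 0.3 * year_sim      # hypothetical fixed weights


def neighbor_score(a, b, people_a, people_b):
    """Average, over a's relatives, of the best level-0 score among b's relatives."""
    rel_a = [people_a[i] for i in a["relatives"]]
    rel_b = [people_b[i] for i in b["relatives"]]
    if not rel_a or not rel_b:
        return 0.0
    best = [max(base_score(ra, rb) for rb in rel_b) for ra in rel_a]
    return sum(best) / len(rel_a)


def stacked_score(a, b, people_a, people_b):
    """Level 1: combine the pair's own score with one aggregated relative score."""
    return 0.6 * base_score(a, b) + 0.4 * neighbor_score(a, b, people_a, people_b)


# Hypothetical pedigree and candidate records, keyed by id.
pedigree = {
    "p1": {"name": "John Smith", "birth_year": 1850, "relatives": ["p2"]},
    "p2": {"name": "Mary Smith", "birth_year": 1852, "relatives": ["p1"]},
}
records = {
    "r1": {"name": "John Smith", "birth_year": 1851, "relatives": ["r2"]},
    "r2": {"name": "Mary Smith", "birth_year": 1852, "relatives": ["r1"]},
    "r3": {"name": "John Smith", "birth_year": 1851, "relatives": ["r4"]},
    "r4": {"name": "Elizabeth Jones", "birth_year": 1820, "relatives": ["r3"]},
}

for rid in ("r1", "r3"):
    score = stacked_score(pedigree["p1"], records[rid], pedigree, records)
    print(rid, round(score, 3))
```

Records r1 and r3 are indistinguishable at the first level because they share a name and birth year, but r1 scores higher at the second level because its relative resembles the pedigree individual's relative. In a learned setting, the fixed weights would be replaced by models trained on hand-labeled comparisons.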
Similar resources
Record Linkage for Genealogical Databases
In this paper we describe past experience and outline current directions in performing record linkage over large genealogical databases. 1. INTRODUCTION AND MOTIVATION Record linkage is the problem of identifying multiple records that refer to the same real-world entity. In genealogical databases, it is the problem of identifying when individuals situated in different pedigrees refer to the sam...
Probabilistic Record Linkage for Genealogical Research
The slowest and most tedious job in genealogical research is searching civil or church records for information about an individual, but it is an essential step. By searching multiple sources such as census records, wills, deeds, and birth and death records, we can compile a more complete set of information, and potentially the pedigree of an individual. When records are stored electronic...
Genealogical Record Linkage: Features for Automated Person Matching
This paper provides a high-level overview of how automatic person matching (genealogical record linkage) algorithms can be developed, and then provides a detailed explanation of many of the features used by FamilySearch in doing person matching. Empirical results show a dramatic improvement in accuracy by using these features trained with neural networks, when compared to traditional probabilis...
Entity resolution in disjoint graphs: An application on genealogical data
Entity Resolution (ER) is the process of identifying references referring to the same entity from one or more data sources. In the ER process, most existing approaches exploit the content information of references, categorized as content-based ER, or additionally consider linkage information among references, categorized as context-based ER. However, in new applications of ER, such as in the gen...
Probabilistic Linkage of Persian Record with Missing Data
Extended Abstract. When the comprehensive information about a topic is scattered among two or more data sets, using only one of those data sets loses the information available in the others. Hence, it is necessary to integrate the scattered information into a single comprehensive data set. On the other hand, sometimes we are interested in recognizing duplicates in a data set. The i...